All that glitters...: Interannotator agreement in natural language processing
Evaluation has emerged as a central concern in natural language processing (NLP) over the last few decades. Evaluation is done against a gold standard, a manually linguistically annotated dataset, which is assumed to provide the ground truth against which the accuracy of the NLP system can be assessed automatically. In this article, some methodological questions in connection with the creation of gold standard datasets are discussed, in particular (non-)expectations of linguistic expertise in annotators and the interannotator agreement measure standardly but unreflectedly used as a kind of quality index of NLP gold standards.
Estimating language relationships from a parallel corpus. A study of the Europarl corpus
Proceedings of the 18th Nordic Conference of Computational Linguistics
NODALIDA 2011.
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), 161-167.
© 2011 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/16955
Dialect classification in the Himalayas: a computational approach
Proceedings of the 18th Nordic Conference of Computational Linguistics
NODALIDA 2011.
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), 307-310.
http://hdl.handle.net/10062/1695
All in the Family: A Comparison of SALDO and WordNet
Proceedings of the NODALIDA 2009 workshop
WordNets and other Lexical Semantic Resources — between Lexical Semantics,
Lexicography, Terminology and Formal Ontologies.
Editors: Bolette Sandford Pedersen, Anna Braasch, Sanni Nimb and
Ruth Vatvedt Fjeld.
NEALT Proceedings Series, Vol. 7 (2009), 7-12.
© 2009 The editors and contributors.
http://hdl.handle.net/10062/9209
Synchronic and Diachronic Aspects of Kanashi
Kanashi is a Sino-Tibetan language belonging to the West Himalayish subbranch of this language family. It is spoken by fewer than 2,000 individuals in a single village (Malana in Kullu district, Himachal Pradesh state, India). The book presents an overview of synchronic and diachronic aspects of Kanashi: its sound system, its grammar in outline, its intriguing numeral systems, and word lists (English-Kanashi, Kanashi-English).
Semantic search in literature as an e-Humanities research tool: CONPLISIT — Consumption patterns and life-style in 19th century Swedish literature
Proceedings of the 18th Nordic Conference of Computational Linguistics
NODALIDA 2011.
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), 58-65.
http://hdl.handle.net/10062/16955
Editor: Donncha O Croinin
Finnish Romani is a language with a fairly recent written tradition; for all practical purposes it is a 20th-century phenomenon. An official orthography was created in 1971, and it is mostly from the 1970s onwards that we see texts of the kind normally associated with a written language variety. The text corpus described here is being compiled to support an ongoing investigation into the effects of language contact on Finnish Romani.
- …